Abstract

Eukaryotic genomes are mosaics of genes acquired from their prokaryotic ancestors, the eubacterial endosymbiont that 
gave rise to the mitochondrion and its archaebacterial host. Genomic footprints of the prokaryotic merger at the origin of 
eukaryotes are still discernable in eukaryotic genomes, where gene expression and function correlate with their pro- 
karyotic ancestry. Molecular chaperones are essential in all domains of life as they assist the functional folding of their 
substrate proteins and protect the cell against the cytotoxic effects of protein misfolding. Eubacteria and archaebacteria 
code for slightly different chaperones, comprising distinct protein folding pathways. Here we study the evolution of the 
eukaryotic protein folding pathways following the endosymbiosis event. A phylogenetic analysis of all 64 chaperones 
encoded in the Saccharomyces cerevisiae genome revealed 25 chaperones of eubacterial ancestry, 11 of archaebacterial
ancestry, 10 of ambiguous prokaryotic ancestry, and 18 that may represent eukaryotic innovations. Several chaperone
families (e.g., Hsp90 and Prefoldin) trace their ancestry to only one prokaryote group, while others, such as Hsp40 and 
Hsp70, are of mixed ancestry, with members contributed from both prokaryotic ancestors. Analysis of the yeast chap- 
erone-substrate interaction network revealed no preference for interaction between chaperones and substrates of the 
same origin. Our results suggest that the archaebacterial and eubacterial protein folding pathways have been reorganized 
and integrated into the present eukaryotic pathway. The highly integrated chaperone system of yeast is a manifestation 
of the central role of chaperone-mediated folding in maintaining cellular fitness. Most likely, both archaebacterial and 
eubacterial chaperone systems were essential at the very early stages of eukaryogenesis, and the retention of both may 
have offered new opportunities for expanding the scope of chaperone-mediated folding. 

Key words: origin of eukaryotes, molecular chaperones, protein evolution. 



Introduction 

The symbiogenic model for the origin of eukaryotes posits 
that eukaryotes arose via a symbiotic association of two dis- 
tantly related prokaryotes (Sagan 1967; Rivera and Lake 2004; 
Embley and Martin 2006; Pisani et al. 2007; Lane 2009; 
Alvarez-Ponce et al. 2013). Opinions about the precise
taxonomic classification and metabolic capacities of the
prokaryotes involved are still divided; however, there is
wide agreement that the host was an
archaebacterium (Martin and Müller 1998; Cox et al. 2008; Williams
et al. 2012) and the endosymbiont was an
alpha-proteobacterium (Gray et al. 1999; Gabaldón and Huynen 2003; Esser
et al. 2004). The eubacterial endosymbiont subsequently 
evolved into the mitochondrion, a process that
was accompanied by a massive DNA transfer from the sym- 
biont into the host genome, the evolution of a mitochondrial 
protein import apparatus, a drastic miniaturization of the 
mitochondrial genome, and an increased complexity of the 
nuclear genome (Martin and Herrmann 1998; Martin 2003;
Timmis et al. 2004). Phylogenomic studies show, accordingly,
that eukaryotic genomes are a mosaic of genes of eubacterial 
and archaebacterial ancestry (Esser et al. 2004; Pisani et al. 
2007; Thiergart et al. 2012; Alvarez-Ponce et al. 2013). 

Evolutionary analysis of genes in the model eukaryote 
Saccharomyces cerevisiae reveals that about 37% of the 
genes can be traced back to either an archaebacterial or a 
eubacterial ancestor (Cotton and McInerney 2010). Thus,
eukaryotic innovations probably account for a sizeable frac- 
tion of eukaryotic genomes. Yet, the proportion of eukaryotic 
genes of demonstrable prokaryotic origin is quite substantial 
considering the complications involved in this kind of analysis. 
The long divergence time elapsed since the symbiotic event 
limits our ability to detect prokaryotic homologs of some
prokaryote-derived proteins and reduces the accuracy of phy- 
logenetic inference for others. Furthermore, lateral gene trans- 
fer events between the eubacterial and archaebacterial 
lineages (e.g., Deppenmeier et al. 2002; Large and Lund 
2009; Williams et al. 2010; Nelson-Sathi et al. 2012) may
have obscured the genetic record of the symbiosis event,
leading to an ambiguous classification of eukaryotic genes.

© The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium,
provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Mol. Biol. Evol. 31(2):410-418 doi:10.1093/molbev/mst212 Advance Access publication November 4, 2013

Integration of Two Ancestral Chaperone Systems • doi:10.1093/molbev/mst212

The chimerical origin of eukaryotic genomes is imprinted 
in the functional role of proteins within the cell. Many pro- 
teins that perform an informational function (e.g., replication, 
transcription, and translation) are of archaebacterial origin,
while many genes of eubacterial origin perform operational
functions (e.g., metabolism, amino acid synthesis, and
regulation) (Rivera et al. 1998; Esser et al. 2004; Cox et al. 2008;
Cotton and McInerney 2010; Alvarez-Ponce and McInerney
2011; Alvarez-Ponce et al. 2013). Eukaryotic genes of
archaebacterial origin are more likely to be essential, independently
of their bias toward informational functions (Cotton and McInerney
2010; Alvarez-Ponce and McInerney 2011). Furthermore,
the eukaryotic protein-protein interaction network still
bears the marks of a chimerical ancestry, with proteins
of the same origin (archaebacterial or eubacterial)
being interconnected at a frequency significantly
above that expected by chance (Alvarez-Ponce and
McInerney 2011). Thus, when considered as a whole, the
eukaryotic proteome can be described as a partially integrated 
version of two ancestral ingredients. 

In this study, we set out to examine the evolution of
the eukaryotic protein folding pathway in light of the
symbiogenic model. Molecular chaperones are proteins that assist
the folding and unfolding of other proteins, as well as the
assembly of protein complexes and the stabilization of protein and
nucleic acid interactions (Hartl and Hayer-Hartl 2009; Large et al.
2009). Chaperones often function in assembly-line-like
pathways in which several chaperones interact consecutively with
the same substrate, driving the transition of the newly
synthesized polypeptide into a functional protein (Young et al. 2004).
Chaperones are essential in all living organisms and have been 
shown to play a role as capacitors of phenotypic variation 
(Rutherford and Lindquist 1998; Queitsch et al. 2002) and 
drivers of increased fitness within organisms facing a high 
mutational load (Fares et al. 2002; Maisnier-Patin et al. 
2005). Furthermore, their function as biochemical mediators
of protein assembly has played an important role in shaping
genomic landscapes (Bogumil and Dagan 2010; Williams and
Fares 2010; Bogumil et al. 2012). The utility of molecular
chaperones is thought to be constrained by a delicate balance
between their benefit in mitigating the effects of protein
misfolding and the cost of slower production and
maturation of their substrates (Bogumil and Dagan 2012).
Archaebacteria and eubacteria harbor slightly different 
repertoires of chaperone families. The Hsp40 and Hsp70 
chaperone families are present in both domains (Macario 
et al. 1991; Macario et al. 1993), whereas other chaperone 
systems, such as chaperonins, differ in their composition 
and assembly. 

Here we study the extent to which the chimeric origin of 
eukaryotes is detectable in the eukaryotic protein folding 
pathway of contemporary genomes. We infer the ancestry 
of yeast chaperones and their substrates, examine the yeast 
chaperone repertoire, and use a network approach to study 
the relationship between chaperones and their substrates in 
light of their origin. 



Results 

Prokaryotic Ancestry of S. cerevisiae Proteins
To determine the prokaryotic origin of yeast proteins, we 
searched for their prokaryotic homologs among 82 archae- 
bacterial and 1,074 eubacterial genomes. A total of 1,230 yeast 
proteins had detectable homologs in one or more prokaryotic 
genomes. The remaining proteins showed no detectable
homology with prokaryotic proteins, and we therefore
consider them to be eukaryotic innovations. A total of 689 
phylogenetic trees were reconstructed for yeast proteins 
having more than three homologs belonging to both archae- 
bacteria and eubacteria. Yeast proteins were classified accord- 
ing to the prokaryotic domain within which they branch. 
Our analysis revealed 289 proteins of archaebacterial ancestry, 
803 of eubacterial ancestry, and 138 of an unresolved 
prokaryotic ancestry. All phylogenetic trees are provided in 
supplementary tables S1 and S2, Supplementary Material 
online. 

The Mosaic Structure of the S. cerevisiae Chaperone 
Repertoire 

Of the 64 known yeast molecular chaperones, 46 had 
homologs in prokaryotic genomes. These were classified 
based on their tree topology into 11 chaperones of archae- 
bacterial ancestry and 25 chaperones of eubacterial ancestry. 
The ancestry of the remaining ten chaperones could not be 
resolved from the data (fig. 1). The Hsp90 family in yeast 
includes two paralogs whose sequences are highly similar 
(96% identity at the amino acid level). Both paralogs are ho- 
mologous to eubacterial htpG sequences exclusively, and 
hence the yeast Hsp90 is clearly of eubacterial origin. 
The prefoldin (PFD) chaperones transfer target proteins to 
the chaperonin containing T-complex polypeptide 1 (CCT)
system for further folding (Vainberg et al. 1998). The yeast 
genome encodes six PFD paralogs whose protein sequences 
are 15.2 ± 3.8% identical. Three of the six PFDs have homologs 
in prokaryotic genomes, all of which are archaebacterial. The 
remaining three paralogs had no detectable homologs in pro- 
karyotic genomes applying the sequence similarity threshold 
used in this study (>25% identical amino acids). This indi- 
cates that PFD is an archaebacterial contribution to eukary- 
otic genomes, and the family further diversified within 
eukaryotes. All five small heat shock proteins (sHsp) were 
inferred to be of eubacterial ancestry. Hsp26 is homologous 
to eubacterial sequences only, and the four paralogous genes 
Hsp31, Hsp32, Hsp33, and Sno4 clearly branch within the 
eubacterial clade, although homologs in halophilic and 
methanogenic archaebacteria were found as well. Members 
of the Hsp100 chaperone family (Clp) play a role in protein
disaggregation (Parsell et al. 1994). Of the three Hsp100 pro- 
teins in yeast, one is localized in the mitochondria and two 
are cytosolic (van Dyck et al. 1998). The mitochondrial Clp 
protein Mcx1 was inferred to be of eubacterial origin. The 
cytosolic Hsp104 was inferred to have been derived from 
an archaebacterial AAA+ ATPase, while the second cytosolic 
Hsp78 is of ambiguous ancestry. The Hsp40 and
Hsp70 families include chaperones with eubacterial as well as
archaebacterial ancestry, although the majority of chaperones
from these particular families are of eubacterial descent.

Fig. 1. Yeast chaperones and their reconstructed ancestries. Archaebacterial ancestry is shown in red and eubacterial ancestry in blue. Chaperones with ambiguous ancestry or no homology to prokaryotic proteins are colored in purple and gray, respectively. Here we use the same structural model for all members of the same family; note that paralogs may deviate in their protein structures. Molecule plots were generated using the PyMOL Molecular Graphics System, version 1.5.0.4 (Schrödinger, LLC).

Eukaryotic genomes typically encode two chaperonin 
systems: the type I mitochondrial Hsp60/Hsp10 system 
(GroEL/ES-like) and the type II chaperonin (CCT-like). The
type I chaperonin system is usually viewed as a eubacterial
set of chaperones; however, it is also encoded in the genomes
of several methanogenic and halophilic archaebacteria
(e.g., Deppenmeier et al. 2002). The yeast Hsp60
branches between a purely archaebacterial clade and a
purely eubacterial clade. Consequently, it was classified as being of
ambiguous prokaryotic ancestry. The cochaperone Hsp10 is
clearly of eubacterial origin. This classification fits well with its 
localization in the mitochondrion. The type II eukaryotic cha- 
peronins comprise eight different protein subunits (Archibald 
et al. 1999; Valpuesta et al. 2002). These chaperones are usu- 
ally viewed as archaebacterial; however, several Clostridia spe- 
cies encode type II chaperonins as well (Techtmann and Robb 
2010; Williams et al. 2010). An archaebacterial ancestry was 
inferred for Tcp1 and a eubacterial origin was inferred for Cct4 
and Cct8. The other five CCT genes were classified as ambig- 
uous as they branch between clostridial and archaebacterial 
homologs. 

Connectivity in the Chaperone Interaction Network 
and Protein Ancestry 

The chaperone-substrate interaction (CSI) network is based 
on a large-scale screen for proteins that interact with 64
chaperones encoded in Saccharomyces cerevisiae (Gong et al.
2009). The CSI network contains 4,340 substrate proteins that
interact with at least one chaperone and a total of 21,428 CSIs. 
Interactions in the CSI network are unweighted and do not 
reflect their relative prevalence. We reduced the data set to 
include only those chaperones and substrates for which pro- 
karyotic ancestry could be determined. This network con- 
tained 36 chaperones and 790 substrates. A total of 3,058 
interactions included in the network were classified into 
four classes based on the ancestry of both the chaperones 
and substrates (inset in fig. 2). 
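The reduction of the network to classifiable nodes and the four-way classification of interactions can be sketched in a few lines of Python, together with the χ² test of independence applied to the resulting 2×2 table. This is an illustrative sketch, not the authors' MATLAB code: the function names and the "A"/"B" ancestry codes are our own, and the χ² test is written out by hand (Pearson statistic, 1 degree of freedom, no continuity correction) so that only the standard library is needed.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def classify_edges(
    edges: Iterable[Tuple[str, str]],
    chaperone_ancestry: Dict[str, str],
    substrate_ancestry: Dict[str, str],
) -> Counter:
    """Tally (chaperone, substrate) edges into the classes Aa, Ab, Ba, Bb.

    Ancestry values are "A" (archaebacterial) or "B" (eubacterial). The class label
    uses uppercase for the chaperone and lowercase for the substrate, as in fig. 2.
    Edges touching a node without an ancestry call are dropped, mirroring the
    reduction of the network to classifiable chaperones and substrates.
    """
    counts: Counter = Counter()
    for chaperone, substrate in edges:
        if chaperone in chaperone_ancestry and substrate in substrate_ancestry:
            label = chaperone_ancestry[chaperone].upper() + substrate_ancestry[substrate].lower()
            counts[label] += 1
    return counts


def chi2_2x2(aa: int, ab: int, ba: int, bb: int) -> Tuple[float, float]:
    """Pearson chi-square test of independence on a 2x2 table (1 df, no continuity correction)."""
    n = aa + ab + ba + bb
    numerator = n * (aa * bb - ab * ba) ** 2
    denominator = (aa + ab) * (ba + bb) * (aa + ba) * (ab + bb)
    chi2 = numerator / denominator
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Reading the four counts shown in the fig. 2 inset (358, 922, 516, 1262) row by row as a 2×2 table, which is our assumption about the inset layout, `chi2_2x2(358, 922, 516, 1262)` gives χ² ≈ 0.40 and P ≈ 0.525, close to the reported P = 0.52. The statistic is unchanged when rows and columns are swapped, so it does not depend on which margin holds the chaperones.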

The network connectivity pattern is not biased toward 
a higher number of interactions between chaperones and 
substrates of the same ancestry (χ² test; P = 0.52; inset in
fig. 2). This type of network data may sometimes be biased 
by nodes having extreme connectivity or gene expression 
levels. To guard against such a possibility, we classified both 
chaperones and substrates into high/low categories according 
to the following properties: network connectivity degree,
mRNA expression, and protein expression.

[Fig. 2 graphic. Inset: interaction counts per chaperone-substrate ancestry class (358, 922, 516, and 1,262; 3,058 in total). Main panel: fraction of network edges per ancestry class (labels include Ab and Bb); x-axis ticks from 0 to 0.45.]

Fig. 2. Prokaryotic origin and connectivity distribution. Asterisks indicate the observed percentage of edges in the network, and bars show the mean expected frequency from randomization simulations. Lines indicate the 1st-99th percentile range. Abbreviations: A, archaebacterial; B, eubacterial; uppercase indicates chaperones and lowercase indicates substrates.

We repeated the
analysis with subsets of the network defined by these con- 
trasts and observed the same pattern as in the full network, 
indicating that the result is robust (see supplementary table 
S3, Supplementary Material online). Moreover, this conclu- 
sion still holds when considering only substrates that interact 
with at least two chaperones (χ² test; P = 0.49).
Although the mean connectivity degree of substrates of 
archaebacterial ancestry (5.33) is higher than that of sub- 
strates of eubacterial ancestry (4.81), this difference is not 
statistically significant (Wilcoxon rank-sum test, P = 0.07). 
To further test for possible biases in the network connectivity 
pattern, we examined the ratio of eubacterial to archaebac- 
terial interaction partners for each chaperone and substrate, 
and tested for differences in the distributions of ratios in the 
two ancestry groups. We found no significant difference in 
the distributions of the chaperone ancestry ratio between 
archaeal and eubacterial substrates (Wilcoxon rank-sum 
test, P = 0.62), and no significant difference in the distribu- 
tions of the substrate ancestry ratio between archaeal and 
eubacterial chaperones (Wilcoxon rank-sum test, P = 0.18). 
We further tested whether any of the four chaperone-sub- 
strate ancestry combinations is enriched in the network by 
conducting a network randomization test with 10,000 ran- 
domization replicates (fig. 2). This analysis shows that none of 
the four interaction types is found at a frequency that is 
significantly different from the random expectation (at a 
false discovery rate [FDR] of 0.01). 
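Most group comparisons in this section use the Wilcoxon rank-sum test. For readers who want to run such comparisons outside MATLAB, the sketch below implements the two-sided test with the large-sample normal approximation, assigning average ranks to ties but applying no tie correction to the variance. Exact small-sample P values and tie corrections, which a statistics package may apply, are deliberately left out, so results can differ slightly from the values reported here; the function names are our own.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (z, p). Ties get average ranks; the variance is not tie-corrected.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p
```

For example, `rank_sum_test(arch_degrees, eub_degrees)` would compare the connectivity degrees of substrates of the two ancestries, where the two (hypothetical) lists hold one degree value per substrate.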

Protein Ancestry and Protein Function 
Substrates in the network were further classified into two 
major functional categories according to their annotation in 
the Gene Ontology database (GO, Ashburner et al. 2000). 
Substrates whose annotation includes the terms "translation," 
"transcription," "DNA-dependent DNA replication," or their 
subterms were classified as proteins performing an informa- 
tional function. The remaining substrates were classified as 
operational proteins (Rivera et al. 1998; Cotton and
McInerney 2010). Combining the functional classification
with prokaryotic ancestry reconstruction revealed that 59%
of the 216 archaebacterial substrates and 15% of the 528
eubacterial substrates found in GO perform informational
functions. Hence, substrates of archaebacterial origin are
enriched for informational functions (χ² test, P < 10^-16),
confirming the known correlations between prokaryotic
ancestry and protein function (Esser et al. 2004; Cotton and
McInerney 2010; Alvarez-Ponce and McInerney 2011;
Alvarez-Ponce et al. 2013). In addition, we found that informational
substrates interact with a larger number of different
chaperones than operational substrates (Wilcoxon rank-sum test,
P < 10^-16).
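The informational/operational split amounts to checking whether any GO annotation of a substrate is one of the three informational terms or one of their subterms. A minimal sketch follows, assuming a hypothetical `parents` mapping from each term name to its parent term names. The real GO uses term identifiers and several relation types, and which relations count as "subterms" is an assumption here.

```python
from typing import Dict, Iterable, Set

# The three informational terms named in the text.
INFORMATIONAL_TERMS = frozenset({"translation", "transcription", "DNA-dependent DNA replication"})


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """The term itself plus every term reachable through its parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def functional_class(annotations: Iterable[str], parents: Dict[str, Set[str]]) -> str:
    """'informational' if any annotation is, or descends from, an informational term."""
    for term in annotations:
        if ancestors(term, parents) & INFORMATIONAL_TERMS:
            return "informational"
    return "operational"
```

With a toy ontology in which "cytoplasmic translation" has "translation" as its parent, a substrate annotated with both "glycolytic process" and "cytoplasmic translation" would be classified as informational.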

Prokaryotic Ancestry and Protein Physicochemical 
Properties 

A comparison of protein physicochemical properties between 
the two ancestry groups revealed several significant
differences. These differences are manifest in proteins that interact
with chaperones as well as in proteins that are
chaperone-independent. Interestingly, the differences observed in
chaperone-independent proteins are significantly larger than those
in the chaperone substrates (fig. 3).

Eubacterial substrates were found to be longer on
average, in agreement with previous studies (Alvarez-Ponce
and McInerney 2011). In addition, eubacterial substrates are
enriched in hydrophobic and aromatic amino acids
in comparison to archaebacterial substrates. Archaebacterial
substrates are more conserved, more highly expressed,
and are encoded by higher proportions of preferred codons
than eubacterial substrates (fig. 3). The biases in the latter three
properties fit well with the known correlation among
evolutionary rate, expression level, and codon usage bias
(Grantham et al. 1981; Sharp and Li 1987; Pál et al. 2001;
Drummond et al. 2005; Pál et al. 2006). In addition, substrates
of archaebacterial origin were enriched for positively charged
amino acids, in particular arginine and lysine, as well as for valine.
On the other hand, substrates of eubacterial origin are significantly
enriched in cysteine, histidine, isoleucine, leucine,
phenylalanine, proline, serine, and tryptophan (fig. 3).

Fig. 3. Differences in protein physicochemical properties between proteins of eubacterial and archaebacterial origin. Enrichment in proteins of eubacterial origin is shown on the left in blue shades, and enrichment in proteins of archaebacterial origin on the right in red shades. Chaperone substrates are in dark shades and proteins not connected to chaperones are in light shades. Asterisks denote statistical significance (Kolmogorov-Smirnov tests); * denotes 5% FDR and ** 1% FDR. Asterisks to the left of the slash refer to tests contrasting protein ancestries, and asterisks to the right of the slash refer to tests contrasting substrates with chaperone-independent proteins. Bar lengths indicate the enrichment ratio on a log10 scale.

Most of the above differences in substrate physicochemical
properties are observed when contrasting informational
and operational proteins, as expected from the
congruence between the ancestral and functional
classifications. 
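The fig. 3 caption reports significance at 5% and 1% FDR but does not name the correction procedure. Assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not confirmed by the text), the adjustment over the per-property Kolmogorov-Smirnov P values, such as those returned by `scipy.stats.ks_2samp`, could be sketched as follows; the function name is our own.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float) -> List[bool]:
    """Return which hypotheses are rejected at false discovery rate q (BH step-up)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k with p_(k) <= k/m * q; every hypothesis up to that rank is rejected.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected
```

Calling the function once with q = 0.05 and once with q = 0.01 would yield the single- and double-asterisk levels used in the figure.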

Discussion 

Our evolutionary reconstruction of the ancestry of
chaperones involved in the yeast protein folding pathway reveals
that chaperones of different descent are used in a coordinated
fashion to fold common substrates. For example, the Hsp40/
Hsp70 system in yeast comprises a total of 21 Hsp40 and 
14 Hsp70 genes from diverse origins including archaebacterial, 
eubacterial, and eukaryotic-specific proteins (ESPs). 
Interestingly, the Hsp40 family, with 11 ESPs, has diversified 
within eukaryotes to a larger extent in comparison to the 
Hsp70 family that includes only one ESP. The difference 
between the two families can be explained by their mode 
of function. Chaperones of the Hsp40 family drive the
activity and substrate specificity of Hsp70 (Cyr and Douglas
1994; Kampinga and Craig 2010). Thus, the diversification of
the Hsp40 family within eukaryotes probably enabled the whole
Hsp40-Hsp70 system to increase its operational potential.
A mosaic of ancestries is observed in all chaperone families
that are present in both archaebacteria and eubacteria.
Mitochondrial chaperones, however, are a notable exception:
all mitochondrial chaperones that could be classified by their
tree topology are inferred to be of eubacterial ancestry,
underlining the role of the mitochondrion as a functional
eubacterial unit within the eukaryotic cell (Esser et al. 2004).

Previous studies showed that there is a significant prefer- 
ence for proteins to interact with partners of the same an- 
cestry rather than across the archaebacterial-eubacterial 
divide (Alvarez-Ponce and McInerney 2011). Such preference
can be expected if the proteins participating in specific cellu- 
lar pathways are usually of a single ancestry. Because protein 
connectivity is higher within pathways than across pathways, 
common ancestry of pathway proteins will result in an overall 
trend for same ancestry interactions. Thus, same ancestry 
preference, while demonstrable on average, may still be vio- 
lated when considering specific systems. Our results suggest 
that the general trend does not hold for the CSI network, 
where no preference for interaction of chaperones and sub- 
strates of the same ancestry could be observed. This indicates 
that the protein folding pathways have been reorganized and 
integrated to a larger extent in comparison to the overall 
protein-protein interactions within the cell. 

Yeast proteins originating from the two endosymbiosis
partners have distinct physicochemical property
profiles (Alvarez-Ponce and McInerney 2011). These differences,
while still significant, are much smaller among proteins that 
utilize chaperones in their folding pathway than among chap- 
erone-independent proteins. The molecular features that 
enable substrates to interact with chaperones, while not yet 
well understood, are likely to place constraints on the various 
physicochemical properties and can thus result in greater 
similarity of chaperone substrates when compared with 
other proteins. Moreover, adaptation to chaperone-assisted
folding is likely to affect these same features, actively driving
substrates away from their ancestral profile and toward a
common eukaryotic profile. For example, archaebacterial
substrates are expressed at significantly higher levels than
eubacterial substrates, yet these differences are still significantly
smaller than those observed for proteins that do not interact
with chaperones. In the crowded cell environment, successful
competition for chaperones is likely to be linked to expression
levels, putting eubacterial proteins at a disadvantage.
A narrower expression range for substrate proteins
may therefore provide a more balanced competition field. This
can be seen as a homogenizing effect of chaperones on their
substrates, and from this perspective, chaperones can be
viewed as inducers of the eukaryotic integrated state.
Chaperones thus have a cumulative impact on eukaryotic genome
evolution.

What makes molecular chaperones a class of proteins that 
is more amenable to integration? Chaperones are highly
versatile proteins that increase the probability that their substrates
attain a functional conformation, and can thereby contribute
significantly to organismal fitness. Chaperones are
essential in both prokaryotic domains (Hartl and Hayer-Hartl
2002; Calloni et al. 2012); hence, at the very origin of
the eukaryotic cytosol, there was an absolute need for chap- 
erones of both ancestries to assist in the folding of their re- 
spective substrates. Some molecular chaperones are very 
versatile, and in vitro they can assist folding of substrates 
from unrelated organisms, even from another prokaryotic 
domain (e.g., Yam et al. 2008). Moreover, similar chaperones 
may have similar substrate specificity and interact with similar 
sets of proteins. Therefore, eubacterial and archaebacterial 
chaperones might have had overlapping substrate sets at 
the initial steps of eukaryogenesis. In vivo, however, this
capacity may not be sufficient, as the organism must sustain
a balanced stoichiometric and energetic profile, which
requires the expression of chaperones and substrates to be
coordinated by a common regulatory regime. Thus, although a
nonspecific interaction pattern allows chaperones to acquire new
clients without the need for extensive sequence modification or
adaptation, the evolution of a completely integrated
system is expected to also include the regulatory context
governing coexpression of chaperones and their substrates,
as well as optimization of the competitive binding of substrates
to their dedicated chaperones.

The effects of combining two ancestral chaperone systems 
may have conferred an even larger fitness benefit than was 
possible by either of the ancestral systems on its own. 
Moreover, the apparent redundancy in the chaperone reper- 
toire may reflect not only the demands of protein folding 
pathways but also the possibility that some chaperones are
involved in other functions. Chaperones have been reported
to possess such moonlighting functions (e.g., Wuppermann
et al. 2008; see Henderson et al. 2013 for a review).
Moonlighting may also explain the expansion and diversifica- 
tion observed in several of the larger eukaryotic chaperone 
families. 

Nonetheless, retaining two chaperone systems would have 
entailed an additional energetic cost for the cell, as chaperone
synthesis and operation are expensive in terms of ATP usage.
In the context of eukaryogenesis, this would not have posed
an insurmountable problem, since the formation of
mitochondria as an intracellular organelle resulted in a dramatic
increase in the available energy for all cellular processes (Lane 
and Martin 2010). Nevertheless, energetic considerations 
might still play a role in the evolution of CSIs (Bogumil and 
Dagan 2012). 

In summary, in contrast with other proteins that still show 
a tendency to form network communities reflecting their 
ancestries, molecular chaperones have been able to cross 
the divide between the ancestral prokaryotic domains. The 
central role of chaperone-assisted folding in maintaining 
cellular fitness is reflected in the high degree of integration 
of an archaebacterial and a eubacterial chaperone systems 
into one at the origin of eukaryotes. 

Materials and Methods 

Data 

Yeast protein sequences, amino acid usage data, functional 
assignments, chromosomal locations, frequencies of optimal 
codons, codon adaptation index, GRAVY scores (grand average of
hydropathy), and aromaticity scores were downloaded from the
Saccharomyces Genome Database (Cherry et al. 1998). 
Chaperone-protein interaction data were obtained from 
Gong et al. (2009). The secondary structure of all proteins 
was inferred using PsiPred (Jones 1999), applying a threshold 
of 70% for the calculation of secondary structure probability. 
Quantitative protein expression data were obtained from 
Ghaemmaghami et al. (2003). The mRNA level data were
obtained from Wang et al. (2002). For the statistical analysis 
of protein expression levels, natural log transformation was 
applied. Proteins for which expression levels were not avail- 
able (107 in total) or with zero expression level (1,665 pro- 
teins) were excluded from the analysis. All statistical analyses 
were performed using the MatLab Statistics toolbox. 
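The filtering and transformation of protein expression values described above fit in a single dictionary comprehension. In this illustrative sketch a missing measurement is represented as `None`, and the function name is hypothetical; negative values, which should not occur for abundance data, are dropped as well because their logarithm is undefined.

```python
import math
from typing import Dict, Optional


def log_expression(levels: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Natural-log transform expression levels, excluding missing and zero values."""
    return {
        protein: math.log(level)
        for protein, level in levels.items()
        if level is not None and level > 0
    }
```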

Evolutionary Rate 

Positional orthology assignments among 20 fungal genomes 
were obtained from Wapinski et al. (2007). Proteins lacking 
orthologs in any genome (282 in total) were excluded 
from the analysis. Multiple sequence alignments of all yeast 
open reading frames with orthologous sequences were 
generated with MAFFT (Katoh et al. 2005). Phylogenetic 
trees were reconstructed with PhyML v3.0_360-500 M 
(Guindon and Gascuel 2003) using the best-fit model as 
inferred by ProtTest 3 (Darriba et al. 2011) according to the 
Akaike information criterion (Akaike 1974).
Distances from the S. cerevisiae proteins to their orthologs 
were calculated as the sum of branch lengths. To calculate 
the relative amino acid substitution rates of substrates, 
the distances to the 20 proteomes were first Z-transformed 
separately and then averaged over all orthologs (Bogumil and 
Dagan 2010). 
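The relative substitution rate can be sketched as follows, assuming the ortholog distances (sums of branch lengths) have already been collected into a nested dictionary keyed first by proteome and then by yeast protein. The text does not state whether the Z-transformation used the population or the sample standard deviation; the sketch uses `statistics.pstdev`, which is an assumption. Because proteins lacking an ortholog in any genome were excluded, every protein is expected to appear in every proteome.

```python
from statistics import mean, pstdev
from typing import Dict, List


def relative_rates(distances: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average per-proteome Z-scores of ortholog distances.

    distances maps each comparison proteome to {yeast protein: distance to its
    ortholog}. Z-scores are computed separately within each proteome and then
    averaged per protein across all proteomes.
    """
    z_scores: Dict[str, List[float]] = {}
    for per_protein in distances.values():
        values = list(per_protein.values())
        mu, sd = mean(values), pstdev(values)
        for protein, d in per_protein.items():
            z_scores.setdefault(protein, []).append((d - mu) / sd)
    return {protein: mean(zs) for protein, zs in z_scores.items()}
```

Standardizing within each proteome first keeps closely and distantly related fungi on a common scale before averaging, so that no single divergent genome dominates a protein's rate estimate.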

Reconstruction of Prokaryotic Ancestries 

We classified each of the 5,880 yeast protein-coding genes
into archaebacterial, eubacterial, ambiguous prokaryotic
ancestry, or eukaryote-specific, based on its phylogenetic
affinities. Each yeast protein sequence was used as a query 
in a homology search against a database containing the pro- 
teomes of 82 archaebacteria and 1,074 eubacteria (3,792,506 
proteins in total). Homology searches were carried out using 
position-specific iterated BLAST (PSI-BLAST; Altschul et al. 1997)
without filtering. Global
pairwise alignments of BLAST hits were calculated using the
EMBOSS package (Needleman and Wunsch 1970; Rice et al. 
2000). Prokaryotic sequences with less than 25% identity were 
considered as having no significant similarity to the particular 
yeast query. Of the yeast genes, 161 had significant similarity 
to archaebacterial sequences exclusively (and were thus clas- 
sified as being of archaebacterial ancestry), 383 had significant 
similarity to eubacterial sequences only (and were thus 
deemed eubacterium-derived), and 686 had homologs in
both prokaryotic domains. The remaining genes had no de- 
tectable prokaryotic homologs at the specified thresholds and 
were thus considered eukaryote-specific. 
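The homology-based pre-classification can be summarized as a small decision function. This sketch assumes each yeast protein comes with a list of (domain, percent identity) pairs, where identity is taken from the global pairwise alignment; the "A"/"B" codes, the function names, and the "needs_tree" label for proteins passed on to the phylogenetic analysis are hypothetical. Hits below 25% identity are ignored, as described above.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, float]  # (domain code "A" or "B", percent identity of the global alignment)


def preclassify(hits: Iterable[Hit], min_identity: float = 25.0) -> str:
    """Classify one yeast protein from its prokaryotic hits.

    Hits below min_identity percent identity are treated as non-homologous.
    """
    domains = {domain for domain, identity in hits if identity >= min_identity}
    if domains == {"A"}:
        return "archaebacterial"
    if domains == {"B"}:
        return "eubacterial"
    if domains == {"A", "B"}:
        return "needs_tree"  # resolved later by phylogenetic analysis
    return "eukaryote-specific"


def tally(hits_per_protein: Dict[str, Iterable[Hit]]) -> Counter:
    """Count proteins per pre-classification category."""
    return Counter(preclassify(hits) for hits in hits_per_protein.values())
```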

To ascertain the ancestry of the 686 yeast genes with both 
archaebacterial and eubacterial homologs, we conducted 
a phylogenetic analysis. For each of these genes, a multiple 
sequence alignment including the 15 best BLAST hits from 
each prokaryotic domain was generated using MAFFT 
v6.843b (Katoh and Toh 2008), and the alignment quality
was tested with GUIDANCE (Penn et al. 2010). To be
conservative in our analysis, columns with a confidence 
score <0.93 were removed. Phylogenetic trees were recon- 
structed with PhyML v3.0_360-500 M (Guindon and Gascuel 
2003) using the best-fit model as inferred by ProtTest 3 
(Darriba et al. 2011) according to the Akaike information 
criterion (Akaike 1974). 
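Removing unreliable alignment columns amounts to keeping the columns whose confidence score reaches the cutoff. The sketch assumes the per-column scores have already been parsed from the GUIDANCE output into a list matching the alignment columns; the parsing step and the function name are not from the source.

```python
from typing import List, Sequence


def filter_columns(alignment: Sequence[str], column_scores: Sequence[float], cutoff: float = 0.93) -> List[str]:
    """Drop alignment columns whose confidence score is below cutoff."""
    if any(len(seq) != len(column_scores) for seq in alignment):
        raise ValueError("every aligned sequence must have one score per column")
    keep = [i for i, score in enumerate(column_scores) if score >= cutoff]
    return ["".join(seq[i] for i in keep) for seq in alignment]
```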

We next rooted each tree on the branch that maximized 
the separation of archaebacterial and eubacterial sequences. 
The internal branch yielding the maximum ratio of archae- 
bacteria to eubacteria content in the resulting clades was 
determined with the MRP function implemented in 
CLANN 3.2.2 (Creevey and McInerney 2005) using
Spearman's rank correlation coefficient. The yeast gene was
classified as of eubacterial or archaebacterial ancestry depending
on the clade within which it branched (see supplementary
fig. S1, Supplementary Material online, for illustrative trees).
Yeast genes were considered of ambiguous ancestry if no
branch yielded a clear separation into archaebacterial
and eubacterial clades, if multiple branches separated the
archaebacterial and eubacterial sequences equally well and 
resulted in conflicting ancestry assignments, or if the yeast 
gene branched between the archaebacterial and eubacterial 
clades. In such ambiguous cases, we repeated the analysis with 
a larger sample of homologous sequences, first with the 30 
best BLAST hits from each domain, and if still ambiguous, 
with the 45 best BLAST hits from each domain. This analysis 
shifted 125 genes from the ambiguous to the unambiguous 
class. Of the 686 yeast genes with both archaebacterial and 
eubacterial homologs, 128 were classified as of archaebacterial 
ancestry, 420 as of eubacterial ancestry, and 138 as ambigu- 
ous. All phylogenetic trees are provided in supplementary 
tables S1 and S2, Supplementary Material online. 
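The decision rules in the preceding paragraphs can be illustrated with a simplified scorer. The study located the best-separating branch with the MRP function of CLANN using Spearman's rank correlation; the sketch below swaps in a simpler score (the difference between the fractions of archaebacterial and eubacterial leaves on one side of a branch) and a hypothetical cutoff `min_score`, so it illustrates the logic rather than reproducing the original criterion. A tree is represented by its internal branches, each given as the set of leaves on one side. `build_tree` is a hypothetical callable standing in for the alignment and PhyML steps at a given number of best hits per domain.

```python
import math
from typing import Callable, Dict, FrozenSet, Iterable, Sequence, Tuple

Split = FrozenSet[str]  # leaves on one side of an internal branch


def assign_from_splits(
    labels: Dict[str, str],
    splits: Iterable[Split],
    query: str = "yeast",
    min_score: float = 0.9,
) -> str:
    """Assign the query's ancestry from the best-separating branch of an unrooted tree.

    labels maps each prokaryotic leaf to "A" or "B". After orienting a branch so that
    its first side is the archaebacterial-enriched one, the separation score is
    |A on side| / |A| - |B on side| / |B| (1.0 = perfect separation).
    """
    leaves = frozenset(labels) | {query}
    arch = {leaf for leaf, dom in labels.items() if dom == "A"}
    eub = {leaf for leaf, dom in labels.items() if dom == "B"}
    if not arch or not eub:
        return "ambiguous"
    best, calls = -math.inf, set()
    for side in splits:
        a_frac = len(arch & side) / len(arch)
        b_frac = len(eub & side) / len(eub)
        if a_frac < b_frac:  # orient: 'side' becomes the archaebacterial-enriched clade
            side = leaves - side
            a_frac, b_frac = 1 - a_frac, 1 - b_frac
        score = a_frac - b_frac
        call = "archaebacterial" if query in side else "eubacterial"
        if score > best and not math.isclose(score, best):
            best, calls = score, {call}
        elif math.isclose(score, best):
            calls.add(call)
    # No clear separation, or equally good branches giving conflicting calls.
    if best < min_score or len(calls) != 1:
        return "ambiguous"
    return calls.pop()


def assign_with_escalation(
    build_tree: Callable[[int], Tuple[Dict[str, str], Sequence[Split]]],
    hits_per_domain: Sequence[int] = (15, 30, 45),
) -> str:
    """Retry ambiguous cases with larger samples of best hits per domain."""
    call = "ambiguous"
    for n in hits_per_domain:
        labels, splits = build_tree(n)
        call = assign_from_splits(labels, splits)
        if call != "ambiguous":
            break
    return call
```

When the yeast sequence is sister to an entire domain clade, two branches separate the domains equally well and give conflicting calls, which the sketch reports as ambiguous; this corresponds to the case in which the yeast gene branches between the archaebacterial and eubacterial clades.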
In total, we inferred 289 proteins to be of archaebacterial
ancestry, 803 of eubacterial ancestry, and 138 proteins with an 
unresolvable prokaryotic ancestry. The remaining yeast pro- 
teins did not show significant similarity with any prokaryotic 
protein. 

Network Randomization 

Randomization of the CSI network was carried out using the 
switching methodology (Stone and Roberts 1990; Artzy- 
Randrup and Stone 2005) implemented in an in-house 
MatLab script. 
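The switching procedure can be sketched in Python (the study used an in-house MATLAB script). Each switch picks two edges and exchanges their substrates, which keeps the number of interactions of every chaperone and every substrate unchanged while shuffling which chaperones interact with which substrates. The number of switches per edge, the default number of replicates, and the function names are assumptions; the study reports 10,000 randomization replicates.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]  # (chaperone, substrate)


def switch_randomize(edges: Sequence[Edge], n_switches: int, rng: random.Random) -> List[Edge]:
    """Degree-preserving randomization of a bipartite network by edge switching.

    Two edges (c1, s1), (c2, s2) are rewired to (c1, s2), (c2, s1) unless that
    would be a no-op or would create an edge that already exists.
    """
    edges = list(edges)
    present = set(edges)
    for _ in range(n_switches):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (c1, s1), (c2, s2) = edges[i], edges[j]
        if c1 == c2 or s1 == s2 or (c1, s2) in present or (c2, s1) in present:
            continue
        present.difference_update({(c1, s1), (c2, s2)})
        present.update({(c1, s2), (c2, s1)})
        edges[i], edges[j] = (c1, s2), (c2, s1)
    return edges


def class_fractions(edges: Sequence[Edge], chap_anc: Dict[str, str], sub_anc: Dict[str, str]) -> Dict[str, float]:
    """Fraction of edges per class, e.g. 'Ab' = archaebacterial chaperone, eubacterial substrate."""
    counts = Counter(chap_anc[c].upper() + sub_anc[s].lower() for c, s in edges)
    return {label: n / len(edges) for label, n in counts.items()}


def null_fractions(
    edges: Sequence[Edge],
    chap_anc: Dict[str, str],
    sub_anc: Dict[str, str],
    replicates: int = 1000,
    switches_per_edge: int = 10,
    seed: int = 0,
) -> List[Dict[str, float]]:
    """Class fractions in independently randomized copies of the observed network."""
    rng = random.Random(seed)
    n_switches = switches_per_edge * len(edges)
    return [
        class_fractions(switch_randomize(edges, n_switches, rng), chap_anc, sub_anc)
        for _ in range(replicates)
    ]
```

Taking, for each class, the mean and the 1st and 99th percentiles of the fractions returned by `null_fractions` gives quantities analogous to the bars and lines drawn in fig. 2.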

Supplementary Material 

Supplementary figure S1 and tables S1-S3 are available at
Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).